Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures
Abstract
Predicting the biological properties of unseen proteins has been shown to improve with the use of protein sequence embeddings. However, these sequence embeddings have a caveat: biological metadata do not exist for each amino acid, and such metadata would be needed to measure the quality of each unique learned embedding vector separately. Therefore, current sequence embeddings cannot be intrinsically evaluated, in a quantitative manner, on the degree of biological information they capture. We address this drawback with our approach, dom2vec, by learning vector representations for protein domains rather than for each amino acid base, because biological metadata do exist for each domain separately. To perform a reliable quantitative intrinsic evaluation in terms of biological knowledge, we selected the metadata related to the most distinctive biological characteristics of a domain: its structure, its enzymatic function, and its molecular function. Notably, dom2vec obtains an adequate level of performance in the intrinsic assessment. Therefore, we can draw an analogy between the local linguistic features of natural languages and the domain structure and function information in domain architectures. Moreover, we demonstrate the applicability of dom2vec to protein prediction tasks by comparing it with state-of-the-art sequence embeddings on three downstream tasks. We show that dom2vec outperforms sequence embeddings for toxin and enzymatic function prediction and is comparable with them for cellular location prediction.
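The abstract does not spell out the training or evaluation procedure, so the following is only a minimal sketch under stated assumptions. It assumes a word2vec-style skip-gram setup in which each protein's domain architecture (its ordered list of domain IDs) plays the role of a sentence and each domain plays the role of a word. It also assumes that intrinsic evaluation is done by leave-one-out nearest-neighbour prediction of a per-domain metadata label, such as a structural class. The function names, the window size, and the toy domain IDs, vectors, and labels are all hypothetical and are not taken from dom2vec.

```python
from collections import Counter
import math


def skipgram_pairs(architectures, window=2):
    """Build (target, context) training pairs, treating each domain
    architecture as a sentence and each domain ID as a word (assumed setup)."""
    pairs = []
    for arch in architectures:
        for i, target in enumerate(arch):
            lo = max(0, i - window)
            hi = min(len(arch), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a domain is not its own context
                    pairs.append((target, arch[j]))
    return pairs


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_accuracy(embeddings, labels, k=1):
    """Leave-one-out k-NN accuracy: predict each domain's metadata label by
    majority vote over its k most cosine-similar other domains."""
    names = [n for n in embeddings if n in labels]
    correct = 0
    for name in names:
        neighbours = sorted(
            (other for other in names if other != name),
            key=lambda other: cosine(embeddings[name], embeddings[other]),
            reverse=True,
        )[:k]
        predicted = Counter(labels[n] for n in neighbours).most_common(1)[0][0]
        correct += predicted == labels[name]
    return correct / len(names) if names else 0.0


if __name__ == "__main__":
    # Hypothetical toy architectures; real inputs would be per-protein domain annotations.
    toy_architectures = [["domA", "domB", "domC"], ["domB", "domC"]]
    print(skipgram_pairs(toy_architectures, window=1))

    # Hypothetical 2-D embeddings and metadata labels for the intrinsic check.
    toy_embeddings = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.1, 0.9]}
    toy_labels = {"d1": "alpha", "d2": "alpha", "d3": "beta", "d4": "beta"}
    print(knn_accuracy(toy_embeddings, toy_labels, k=1))
```

The pairs from `skipgram_pairs` are what a skip-gram model would be trained on. A high `knn_accuracy` would indicate that domains sharing a metadata label end up close together in the embedding space, which is the kind of intrinsic signal the abstract describes for structure and function.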
Journal
Journal title: Algorithms
Year: 2021
ISSN: 1999-4893
DOI: https://doi.org/10.3390/a14010028